Color, Size, Shape, Transparency, and Linetype
By the end of this lecture, you will be able to:
Aesthetics bring plots to life: color, size, shape, transparency, linetype
aes() maps variables to these aesthetics, linking data to visual features.
Aesthetics highlight comparisons and guide attention.
These are the simple geoms that we covered previously.
This is what they are used for:
| geom | Use for |
|---|---|
| geom_point() | Relationships between two variables; at least 10 obs. |
| geom_col() | Totals/percent per category; ordered bars. Uses pre-computed values for bar height. You must supply both x and y. |
| geom_bar() | Counts the observations in each category. You must supply x, not y. |
| geom_text() | Direct labels for small N; annotate outliers |
| geom_line() | Relationships between two variables; at least 10 obs. |
Let’s explore how aesthetics change meaning and clarity.
This is what they are used for:
| Aesthetic | Use for |
|---|---|
| color (discrete) | Differentiate categories with distinct hues |
| color (continuous) | Show gradual change or intensity across a numeric scale |
| size (discrete + continuous) | Represent magnitude, frequency, or importance; best with moderate differences |
| fill (discrete + continuous) | Similar to color but applies to filled shapes (bars, areas, points with pch=21–25) |
| shape (discrete) | Distinguish categories when colors alone aren’t enough; limited shapes available |
| alpha (discrete + continuous) | Control transparency to show overlap/density; reduce clutter in crowded plots |
Imagine we have data about 9 countries that record their level of democracy from -10 to +10 (x-axis) and their GDP per capita in $1,000s (y-axis).
| democracy | gdp |
|---|---|
| -8 | 2 |
| -7 | 9 |
| -5 | 4 |
| -3 | 7 |
| 0 | 8 |
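A minimal sketch of the scatterplot for these data, assuming ggplot2 is installed. The data frame name `eg0` is hypothetical, and the nine values are the democracy and gdp columns of the fuller dataset used in this lecture (the table above shows only the first five rows).

```r
library(ggplot2)

# Hypothetical name; values match the democracy and gdp columns used in this lecture
eg0 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),  # democracy score (-10 to +10)
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27)   # GDP per capita ($1,000s)
)

# Map democracy to x and gdp to y; each country becomes one point
p_basic <- ggplot(eg0, aes(x = democracy, y = gdp)) +
  geom_point()
p_basic
```

With only x and y mapped, the plot shows position and nothing else. The aesthetics covered next add further variables to the same points.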
Let’s imagine that we have a new dataset with more variables:
# Reusable dataset
eg1 <- data.frame(
  country = c(
    "North Korea",   # very autocratic, very poor
    "Saudi Arabia",  # autocratic, but richer due to oil
    "Zimbabwe",      # authoritarian, low GDP
    "Russia",        # hybrid regime, middle income
    "Nigeria",       # similar position to Russia
    "India",         # low–mid democracy, growing GDP
    "Brazil",        # democracy, mid GDP
    "Poland",        # consolidated democracy, higher GDP
    "South Korea"    # rich democracy
  ),
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),          # democracy score
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27),                # GDP per capita ($1,000s)
  region = c("Asia", "Asia", "Africa",
             "Europe", "Africa", "Asia",
             "Americas", "Europe", "Asia"),              # categorical
  population = c(5, 50, 30, 12, 80, 60, 40, 100, 70),    # continuous (millions)
  income_group = factor(
    c("Low", "Low", "Low",
      "Middle", "Middle", "Middle",
      "High", "High", "High"),
    levels = c("Low", "Middle", "High")),                # categorical, ordered levels
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)     # continuous
)

The first rows look like this:
| country | democracy | gdp | region | population | income_group | corruption |
|---|---|---|---|---|---|---|
| North Korea | -8 | 2 | Asia | 5 | Low | 80 |
| Saudi Arabia | -7 | 9 | Asia | 50 | Low | 65 |
| Zimbabwe | -5 | 4 | Africa | 30 | Low | 50 |
| Russia | -3 | 7 | Europe | 12 | Middle | 40 |
| Nigeria | 0 | 8 | Africa | 80 | Middle | 35 |
This is how we can emphasize the region (discrete):
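A minimal sketch of mapping a discrete variable to color, assuming ggplot2 is installed. `eg1_sub` is a hypothetical copy of the eg1 columns used here, so the block runs on its own; with eg1 already loaded, eg1 can be used directly.

```r
library(ggplot2)

# Hypothetical stand-in for eg1: only the columns this plot needs
eg1_sub <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  region    = c("Asia", "Asia", "Africa", "Europe", "Africa",
                "Asia", "Americas", "Europe", "Asia")
)

# color (discrete): one hue per region; ggplot builds the legend automatically
p_region <- ggplot(eg1_sub, aes(x = democracy, y = gdp, color = region)) +
  geom_point(size = 3)  # size is set outside aes(), so it is a fixed style
p_region
```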
This is how we can emphasize the corruption levels (continuous):
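A minimal sketch of mapping a continuous variable to color, assuming ggplot2 is installed. `eg1_sub` is a hypothetical copy of the eg1 columns used here, so the block runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for eg1: only the columns this plot needs
eg1_sub <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)
)

# color (continuous): a numeric variable produces a gradient and a color bar
p_corruption <- ggplot(eg1_sub, aes(x = democracy, y = gdp, color = corruption)) +
  geom_point(size = 3)
p_corruption
```

Because corruption is numeric, the legend is a color bar rather than a list of categories.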
This is how we can emphasize the income groups (discrete):
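One way to do this, sketched under the assumption that shape is the aesthetic of interest (other discrete aesthetics such as color would work the same way). `eg1_sub` is a hypothetical copy of the eg1 columns used here.

```r
library(ggplot2)

# Hypothetical stand-in for eg1: only the columns this plot needs
eg1_sub <- data.frame(
  democracy    = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp          = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  income_group = factor(
    rep(c("Low", "Middle", "High"), each = 3),
    levels = c("Low", "Middle", "High"))
)

# shape (discrete): one point symbol per income group
p_shape <- ggplot(eg1_sub, aes(x = democracy, y = gdp, shape = income_group)) +
  geom_point(size = 3)
p_shape
```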
This is how we can emphasize population size (continuous)
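A minimal sketch of mapping a continuous variable to point size, assuming ggplot2 is installed. `eg1_sub` is a hypothetical copy of the eg1 columns used here.

```r
library(ggplot2)

# Hypothetical stand-in for eg1: only the columns this plot needs
eg1_sub <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  population = c(5, 50, 30, 12, 80, 60, 40, 100, 70)  # millions
)

# size (continuous): larger populations are drawn as larger points
p_size <- ggplot(eg1_sub, aes(x = democracy, y = gdp, size = population)) +
  geom_point()
p_size
```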
This is how we can emphasize income groups (discrete).
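One way to do this, sketched under the assumption that fill is the aesthetic of interest. Fill only affects points whose shape has an interior (shapes 21 to 25), so the shape is set explicitly. `eg1_sub` is a hypothetical copy of the eg1 columns used here.

```r
library(ggplot2)

# Hypothetical stand-in for eg1: only the columns this plot needs
eg1_sub <- data.frame(
  democracy    = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp          = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  income_group = factor(
    rep(c("Low", "Middle", "High"), each = 3),
    levels = c("Low", "Middle", "High"))
)

# fill (discrete): interior color by income group; the outline stays black
p_fill <- ggplot(eg1_sub, aes(x = democracy, y = gdp, fill = income_group)) +
  geom_point(shape = 21, size = 4, color = "black")
p_fill
```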
This is how we can emphasize the level of corruption (continuous).
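One way to do this, sketched under the assumption that alpha (transparency) is the aesthetic of interest. `eg1_sub` is a hypothetical copy of the eg1 columns used here.

```r
library(ggplot2)

# Hypothetical stand-in for eg1: only the columns this plot needs
eg1_sub <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)
)

# alpha (continuous): higher corruption values are drawn more opaque
p_alpha <- ggplot(eg1_sub, aes(x = democracy, y = gdp, alpha = corruption)) +
  geom_point(size = 4)
p_alpha
```

Transparency is harder to read precisely than color or size, so it tends to work best as a secondary cue.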
aes()
This is what the data looks like:
| x | y | group |
|---|---|---|
| 1 | 2 | A |
| 2 | 4 | B |
| 3 | 6 | A |
| 4 | 8 | B |
| 5 | 10 | A |
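The table above can be created, and the two ways of using color compared, with a short sketch like the following. It assumes ggplot2 is installed; the names `eg2`, `p_inside`, `p_outside`, and `p_group` are hypothetical.

```r
library(ggplot2)

# The five rows shown in the table above (data frame name is an assumption)
eg2 <- data.frame(
  x     = 1:5,
  y     = c(2, 4, 6, 8, 10),
  group = c("A", "B", "A", "B", "A")
)

# Inside aes(): "red" is read as data, a category with one level.
# The points get the default palette color and a legend titled "colour".
p_inside <- ggplot(eg2, aes(x = x, y = y)) +
  geom_point(aes(color = "red"), size = 3)

# Outside aes(): "red" is a fixed style, so the points are red and no legend appears
p_outside <- ggplot(eg2, aes(x = x, y = y)) +
  geom_point(color = "red", size = 3)

# Inside aes() is the right place for a real variable such as group
p_group <- ggplot(eg2, aes(x = x, y = y, color = group)) +
  geom_point(size = 3)

p_inside
p_outside
p_group
```

A rule of thumb: variables go inside aes(), constants go outside.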
This is the difference:
Inside aes(): "red" is treated as a category, so ggplot makes a useless legend. That is why it's wrong.
Outside aes(): "red" is just a style, so there is no legend clutter.
| Aesthetic | Use for |
|---|---|
| color (discrete) | Differentiate categories with distinct outline hues (rare for bars) |
| color (continuous) | Show gradual change or intensity in outline scale (rare for bars) |
| fill (discrete) | Fill bars by category with distinct hues (common) |
| fill + color (outline) | Combine interior fill with a contrasting border for clarity |
| fill (continuous) | Show gradual change or intensity across a numeric fill scale (rare for bars) |
| alpha (discrete + continuous) | Control transparency to show emphasis or reduce clutter (rare for bars) |
Suppose we have survey responses by education level. We’ll compare the number of respondents in each category.
| education |
|---|
| Primary |
| Primary |
| High School |
| High School |
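A minimal sketch of geom_bar() on raw responses like these, assuming ggplot2 is installed. The data frame name `eg3` is hypothetical, and the nine responses are chosen to match the category counts used in this example (2 Primary, 3 High School, 4 College).

```r
library(ggplot2)

# One row per respondent (hypothetical name; counts match this example)
eg3 <- data.frame(
  education = c(rep("Primary", 2), rep("High School", 3), rep("College", 4))
)

# geom_bar() counts the rows in each category, so only x is supplied
p_bar <- ggplot(eg3, aes(x = education)) +
  geom_bar()
p_bar
```

Categories appear in alphabetical order by default, because education is a character column.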
Counting the respondents in each category gives:
| education | n |
|---|---|
| College | 4 |
| High School | 3 |
| Primary | 2 |
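A minimal sketch of geom_col() on pre-computed counts, combining fill with a black outline. It assumes ggplot2 is installed; the data frame name `eg3_counts` is hypothetical.

```r
library(ggplot2)

# Pre-computed counts, as in the table above (hypothetical name)
eg3_counts <- data.frame(
  education = c("College", "High School", "Primary"),
  n         = c(4, 3, 2)
)

# geom_col() uses y directly as bar height, so both x and y are supplied.
# fill is mapped to the category; color = "black" sets a fixed outline.
p_col <- ggplot(eg3_counts, aes(x = education, y = n, fill = education)) +
  geom_col(color = "black")
p_col
```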
| Aesthetic | Use for |
|---|---|
| color (discrete) | Distinguish groups with different line colors |
| color (continuous) | Show gradient along x or y values; possible but unusual (rare for lines) |
| size | Deprecated; replaced by linewidth |
| linewidth (discrete + continuous) | Vary line thickness to emphasize magnitude (sometimes used) or weight (rare for lines) |
| linetype (discrete) | Differentiate groups with solid/dashed/dotted styles |
| alpha (discrete + continuous) | Control transparency to reduce clutter when many lines overlap (rare for lines) |
Suppose we have data on average voter turnout (%) in national elections over several years. We want to see the trend in participation.
| year | turnout |
|---|---|
| 2000 | 55 |
| 2004 | 58 |
| 2008 | 62 |
| 2012 | 60 |
| 2016 | 59 |
| 2020 | 65 |
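A minimal sketch of a line chart for these values, assuming ggplot2 3.4.0 or later (for the linewidth argument). The data frame name `eg4` is hypothetical.

```r
library(ggplot2)

# Turnout values from the table above (hypothetical name)
eg4 <- data.frame(
  year    = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65)
)

# geom_line() connects observations in order of x; points mark each election
p_line <- ggplot(eg4, aes(x = year, y = turnout)) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size for lines
  geom_point()
p_line
```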
Suppose we have data on average voter turnout (%) in national elections over several years for the US and the UK. We want to see the trend in participation.
# Toy dataset with US and UK
eg5 <- data.frame(
  year = rep(c(2000, 2004, 2008, 2012, 2016, 2020), times = 2),
  turnout = c(
    # US presidential elections
    54, 60, 62, 58, 56, 65,
    # UK general elections (closest years aligned to US election years for teaching)
    59, 61, 65, 66, 68, 67
  ),
  country = rep(c("United States", "United Kingdom"), each = 6)
)
| year | turnout | country |
|---|---|---|
| 2000 | 54 | United States |
| 2004 | 60 | United States |
| 2008 | 62 | United States |
| 2012 | 58 | United States |
| 2016 | 56 | United States |
| 2020 | 65 | United States |
| 2000 | 59 | United Kingdom |
| 2004 | 61 | United Kingdom |
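A minimal sketch of distinguishing the two countries with color and linetype, assuming ggplot2 3.4.0 or later. The toy dataset is repeated so the block runs on its own.

```r
library(ggplot2)

# Same toy values as eg5 in this lecture
eg5 <- data.frame(
  year    = rep(c(2000, 2004, 2008, 2012, 2016, 2020), times = 2),
  turnout = c(54, 60, 62, 58, 56, 65,   # United States
              59, 61, 65, 66, 68, 67),  # United Kingdom
  country = rep(c("United States", "United Kingdom"), each = 6)
)

# color and linetype are both mapped to country, so each country gets
# its own line, and the two legends merge into one
p_lines <- ggplot(eg5, aes(x = year, y = turnout,
                           color = country, linetype = country)) +
  geom_line(linewidth = 1)
p_lines
```

Mapping the same variable to two aesthetics is redundant on purpose: the lines stay distinguishable in grayscale print or for readers who cannot tell the colors apart.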
| Geom | Useful aesthetics | Avoid |
|---|---|---|
| Boxplot | fill, color | size, shape |
| Histogram | fill, alpha | shape |
| Density | color, linetype | size |
| Violin | fill, color | shape |
| Smooth | color, linetype | size |
| SF (maps) | fill, color | shape |
geom_boxplot
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.
| gender | trust |
|---|---|
| Men | 3.879049 |
| Men | 4.539645 |
| Men | 8.117417 |
| Men | 5.141017 |
| Men | 5.258576 |
| Men | 8.430130 |
| Men | 5.921832 |
| Men | 2.469877 |
geom_boxplot: fill (color)
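A minimal sketch of a filled boxplot, assuming ggplot2 is installed. The trust values for men shown in this example appear consistent with `set.seed(123)` and `rnorm(mean = 5, sd = 2)`. The group size, the parameters for women, and the data frame name `eg6` are assumptions made for illustration, and the simulated values are not clipped to the 1–10 scale.

```r
library(ggplot2)

set.seed(123)  # seed assumed; it reproduces the first Men values listed in this example
n <- 50        # assumption: respondents per group

eg6 <- data.frame(
  gender = rep(c("Men", "Women"), each = n),
  trust  = c(rnorm(n, mean = 5, sd = 2),     # Men
             rnorm(n, mean = 6, sd = 1.5))   # Women (assumed parameters)
)

# fill is mapped to gender, so each box gets its own interior color
p_box <- ggplot(eg6, aes(x = gender, y = trust, fill = gender)) +
  geom_boxplot()
p_box
```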
geom_histogram: fill + alpha
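A minimal sketch of overlapping histograms with fill and alpha, assuming ggplot2 is installed. The simulated data, the group size, the parameters for women, and the name `eg6` are assumptions made for illustration.

```r
library(ggplot2)

set.seed(123)  # assumed seed
n <- 50        # assumption: respondents per group

eg6 <- data.frame(
  gender = rep(c("Men", "Women"), each = n),
  trust  = c(rnorm(n, mean = 5, sd = 2),     # Men
             rnorm(n, mean = 6, sd = 1.5))   # Women (assumed parameters)
)

# fill separates the groups; alpha = 0.5 lets the overlap show through.
# position = "identity" overlays the two histograms instead of stacking them.
p_hist <- ggplot(eg6, aes(x = trust, fill = gender)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 15)
p_hist
```

Without position = "identity", geom_histogram() stacks the groups, and the transparency would add little.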
geom_sf: fill (color)

library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)

# Country polygons as an sf object
world <- ne_countries(scale = "medium",
                      returnclass = "sf")

# Longitude/latitude window that roughly covers Europe
europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))

# Mapping it: fill each country by the log of its population estimate
ggplot() +
  geom_sf(data = world, aes(fill = log(pop_est))) +
  coord_sf(xlim = europe_bounds$x,
           ylim = europe_bounds$y)

On the most fundamental level, we need to use the right colors for our visualizations
This is relevant for:
8% of men and 0.5% of women have some form of color blindness
Thus, colors should be distinguishable by people with different forms of color blindness
The Viridis palette in R allows us to create color-blind-friendly graphs
These are predefined palettes that are widely used.
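A minimal sketch of applying a Viridis palette to a discrete variable, assuming ggplot2 3.0.0 or later (which provides `scale_color_viridis_d()`). `eg1_sub` is a hypothetical copy of the eg1 columns used here, so the block runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for eg1: only the columns this plot needs
eg1_sub <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  region    = c("Asia", "Asia", "Africa", "Europe", "Africa",
                "Asia", "Americas", "Europe", "Asia")
)

# Default discrete palette
p_default <- ggplot(eg1_sub, aes(x = democracy, y = gdp, color = region)) +
  geom_point(size = 3)

# Same plot with the discrete Viridis scale (_d for discrete, _c for continuous)
p_viridis <- p_default +
  scale_color_viridis_d()

p_default
p_viridis
```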
geom_sf fill (color): no Viridis

library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)

# Country polygons as an sf object
world <- ne_countries(scale = "medium",
                      returnclass = "sf")

# Longitude/latitude window that roughly covers Europe
europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))

# Mapping it with the default continuous fill scale
ggplot() +
  geom_sf(data = world, aes(fill = log(pop_est))) +
  coord_sf(xlim = europe_bounds$x,
           ylim = europe_bounds$y) +
  labs(title = "European Countries")

geom_sf fill (color): Viridis

# Same map (same libraries, world, and europe_bounds), now with a Viridis fill scale
ggplot() +
  geom_sf(data = world, aes(fill = log(pop_est))) +
  coord_sf(xlim = europe_bounds$x,
           ylim = europe_bounds$y) +
  labs(title = "European Countries") +
  scale_fill_viridis_c(option = "viridis")  # you can also try "magma", "inferno", "cividis", etc.

What do you see?
How does the story change?
What does adding size tell us?

# color (continuous): encode a numeric gradient (low → high)
# size (continuous): encode magnitude with point size
ggplot(eg1, aes(x = democracy, y = gdp, color = corruption, size = population)) +
  geom_point() +
  scale_color_viridis_c(option = "viridis")  # you can also try "magma", "inferno", "cividis", etc.

What do you see?
democracy ↔ gdp: more democratic countries tend to have higher GDP per capita.
How does the story change?
Corruption and institutions matter.
What does adding size tell us?
Population weights the story: more populous countries also have higher GDP and are more democratic.
Popescu (JCU): Data Visualization 2